
    Examination and utilization of rare features in text classification of injury narratives

    Thanks to advances in computing and information technology, analyzing injury surveillance data with statistical machine learning methods has grown in popularity, complexity, and quality in recent years. Over the same period, researchers have recognized the limitations of statistical text analysis when training data are limited. In response to the two primary challenges for statistical text analysis, high dimensionality and data sparsity, many studies have focused on improving machine learning algorithms. Less research, however, has examined and improved statistical machine learning methods for text classification from a linguistic perspective. This study addresses that gap by examining the importance of extreme-frequency words in classifying injury narratives. The results indicate that following the common practice of removing frequently occurring prepositions from the text significantly decreased classification performance for certain categories. Removing low-frequency words significantly improved classification performance for Multinomial Naive Bayes (MNB) and helped alleviate overfitting of small categories for Logistic Regression (LR), but had no significant effect for Support Vector Machine (SVM). To make use of low-frequency words, classic word normalization or grouping methods such as stemming and lemmatization are often applied during text preprocessing. Despite their popularity, these classic grouping methods have limitations. The proposed Type M+S Word Grouping Method automatically groups rare and unseen words by morphology and semantics using unlabeled data. Several experiments evaluated the grouping effect for three classifiers (MNB, SVM, LR) in three train-test scenarios (1:9, 1:1, 9:1) on injury surveillance data comprising half a million narratives classified into 30 external cause categories.
The experimental results show that the proposed method, optionally paired with three add-on methods (two-word sequence tagging, reviewed tagging, and a Naive Bayes-weighted classifier), achieved better classification performance than stemming and lemmatization. For small categories with limited training data, overall classification performance improved for MNB (5.5%), SVM (4%), and LR (11.2%), an effect comparable to increasing the size of the labeled training set by a factor of 3.6 for MNB, 2.3 for SVM, and 5.2 for LR. Some improvement was also observed for medium-sized categories (1.7%), while performance on large categories remained nearly unchanged (0.1%). Overall, the results support the conclusion that the proposed decision-support method is a promising approach for incorporating expert knowledge to improve machine learning for classifying injury narratives with reduced manual effort. The results also suggest that simply enlarging the training dataset would not reach the level of performance the proposed method achieves, because linear classifiers are inherently limited in acquiring, from the narratives alone, the fundamental concepts and classification rules that human experts know from the definitions of injuries.
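To illustrate the low-frequency word removal described above, the sketch below pairs a corpus-frequency cutoff with a plain Multinomial Naive Bayes classifier (Laplace smoothing). This is a generic, minimal sketch and not the study's Type M+S Word Grouping Method; the `min_count` threshold, the helper names, and the toy narratives and labels are hypothetical and invented for illustration, not taken from the injury surveillance dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; real pipelines would do more.
    return text.lower().split()


def build_vocab(docs, min_count=2):
    """Keep only words whose total corpus frequency is at least min_count.

    min_count is an assumed threshold; words below it are dropped as rare.
    """
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return {w for w, c in counts.items() if c >= min_count}


class MultinomialNB:
    """Minimal Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        # Count only in-vocabulary tokens per class.
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(t for t in tokenize(doc) if t in vocab)
        n = len(labels)
        self.log_prior = {c: math.log(label_counts[c] / n) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        # Smoothed P(word | class) over the filtered vocabulary.
        v = len(self.vocab)
        num = self.word_counts[c][word] + self.alpha
        den = self.total[c] + self.alpha * v
        return math.log(num / den)

    def predict(self, doc):
        # Out-of-vocabulary (including filtered rare) words are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self.log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy, invented narratives (hypothetical labels "fall" and "cut").
train_docs = [
    "fell from ladder while painting",
    "fell down stairs at home",
    "slipped and fell on wet floor",
    "cut finger with kitchen knife",
    "cut hand on broken glass",
    "knife slipped and cut thumb",
]
labels = ["fall", "fall", "fall", "cut", "cut", "cut"]

vocab = build_vocab(train_docs, min_count=2)
model = MultinomialNB().fit(train_docs, labels, vocab)
print(model.predict("fell off roof"))
print(model.predict("sliced arm with knife"))
```

With `min_count=2`, words seen only once (for example "ladder" or "glass") are discarded, so the classifier relies on the more frequent terms; the study's point is that rare words still carry signal, which is why it groups them rather than simply dropping them.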

    Regulation of CLC-1 chloride channel biosynthesis by FKBP8 and Hsp90β.

    Mutations in the human CLC-1 chloride channel are associated with the skeletal muscle disorder myotonia congenita. The disease-causing mutant A531V manifests enhanced proteasomal degradation of CLC-1. We recently found that CLC-1 degradation is mediated by the cullin 4 ubiquitin ligase complex. It is currently unclear how the quality control and protein degradation systems coordinate with each other during CLC-1 biosynthesis. Herein we aim to ascertain the molecular nature of the protein quality control system for CLC-1. We identified three CLC-1-interacting proteins that are well-known heat shock protein 90 (Hsp90)-associated co-chaperones: FK506-binding protein 8 (FKBP8), activator of Hsp90 ATPase homolog 1 (Aha1), and Hsp70/Hsp90 organizing protein (HOP). These co-chaperones promote both the protein level and the functional expression of CLC-1 wild-type and the A531V mutant. CLC-1 biosynthesis is also facilitated by the molecular chaperones Hsc70 and Hsp90β. The protein stability of CLC-1 is notably increased by FKBP8 and by the Hsp90β inhibitor 17-allylamino-17-demethoxygeldanamycin (17-AAG), which substantially suppresses cullin 4 expression. We further confirmed that cullin 4 may interact with Hsp90β and FKBP8. Our data are consistent with the idea that FKBP8 and Hsp90β play an essential role in the late phase of CLC-1 quality control by dynamically coordinating protein folding and degradation.